Basic idea

We want to test the sensitivity of the numerical outputs of MERA. In particular we are interested in:

  1. The absolute value an MP converges to. If, after running MERA 5,000 times, we find that an MP has a 78% probability of overfishing, what should our confidence bound on that prediction actually be? (A quick way to bound this is sketched just after this list.)
  2. The relative value of one MP versus another. If, after running MERA 5,000 times, one MP has a 5% lower chance of overfishing than an alternative, how much safer is the first MP really?
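
For the first question, the pure Monte Carlo part of the uncertainty is easy to bound: conditional on the priors, the converged probability behaves like a binomial proportion. A back-of-the-envelope sketch in R (assuming independent simulation draws, with the numbers from the example above):

```r
# Monte Carlo standard error of a probability estimated from n independent runs
p <- 0.78                           # converged probability of overfishing
n <- 5000                           # number of MSE simulations
se <- sqrt(p * (1 - p) / n)         # binomial standard error, about 0.006
round(p + c(-1, 1) * 1.96 * se, 3)  # approximate 95% interval: 0.769 to 0.791
```

That interval is barely more than a percentage point wide, which is exactly why convergence looks so reassuring. The rest of this note is about the conditional-on-priors part of the uncertainty, which this interval does not capture at all.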

Limitations

Because we needed to run the model many times in a flexible way, we mostly worked with the DLMtool package directly. This means that we varied parameters that may not be in the questionnaire, or that are distributed differently than in MERA. We also did not run this analysis on a calibrated model.

We can address these limitations once we have a full Indonesia model for both tools and have looked at the policy conclusions in a bit more detail.

First numerical example

We use the “snapper” OM from the library (“South Atlantic Red Snapper”). We compare two policies: matlenlim and Itarget1, running an MSE with 5,000 simulations. The result looks like a simple PNOF–LTY tradeoff. By 5,000 simulations the recommendations have converged quite precisely to the values in the table below.

MP         PNOF   P50    AAVY   LTY
Itarget1   0.83   0.91   0.91   0.76
matlenlim  0.77   0.92   0.37   0.80
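
A minimal sketch of this kind of run with DLMtool (the exact choice of stock, fleet, observation and implementation objects is an assumption; ours differed in detail):

```r
library(DLMtool)

# Operating model assembled from library components ("Snapper" stock object,
# generic fleet/observation dynamics, perfect implementation)
OM <- new("OM", Snapper, Generic_Fleet, Generic_Obs, Perfect_Imp)
OM@nsim <- 5000

# Run the MSE for the two candidate management procedures
MSE <- runMSE(OM, MPs = c("matlenlim", "Itarget1"))

# Converged performance metrics
PNOF(MSE)   # probability of not overfishing
LTY(MSE)    # long-term yield
```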

Now, the number we converge to is, strictly speaking, the expected value conditional on the priors we place on all the parameters of the model. In MERA this means conditional on the answers to the questionnaire and on default values for everything else. This is rather obvious of course (the questions matter), but it also means that convergence alone doesn’t actually tell us much about sensitivity and uncertainty. What we need to check is how much these answers would change if our priors changed a tiny bit.

We run two sets of experiments here:

1. Bound sensitivity: what if the boundaries of the priors of the various parameters were slightly off in one direction or the other (±10%)?
2. Shape sensitivity: what if the boundaries of the priors were specified correctly, but the priors were not uniform, and instead triangular in some form?

The snapper OM has 37 parameters with some form of distribution on them. These are the parameters we vary:

##  [1] "M"            "Msd"          "h"            "Perr"         "AC"          
##  [6] "Linf"         "K"            "t0"           "LenCV"        "Ksd"         
## [11] "Linfsd"       "L50"          "L50_95"       "D"            "Size_area_1" 
## [16] "Frac_area_1"  "Prob_staying" "Esd"          "qinc"         "qcv"         
## [21] "Cobs"         "CAA_nsamp"    "CAA_ESS"      "CAL_nsamp"    "CAL_ESS"     
## [26] "Iobs"         "Btobs"        "Btbiascv"     "beta"         "Dobs"        
## [31] "Recbiascv"    "TACFrac"      "TACSD"        "TAEFrac"      "TAESD"       
## [36] "SizeLimFrac"  "SizeLimSD"
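
These names can be pulled programmatically from the OM object. A sketch, assuming (as DLMtool does for most ranged parameters) that each prior is stored as a length-2 numeric slot holding its lower and upper bound; in practice the filter may need a manual exclusion list for length-2 slots that are not priors:

```r
# Names of all OM slots that hold a (lower, upper) bound pair
ranged_slots <- Filter(function(s) {
  v <- slot(OM, s)
  is.numeric(v) && length(v) == 2
}, slotNames(OM))
ranged_slots
```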

The basic result is that, for this example, the variation is quite large when changing bounds, and larger still when changing shapes:

Figure 1: The tradeoff plot between probability of not overfishing (y-axis) and long-term yield (x-axis). When allowing either the bounds or the shapes of the priors to vary, the crisp tradeoff between the two policies is lost. Each dot represents the average values obtained after running the MSE simulation 500 times, thereby achieving convergence. Each separate dot in the sensitivity facets is obtained by a separate random shock to the priors, as described in the sections below. In total there are 1494 simulations for bound sensitivity and 2129 for shape sensitivity.

There is a third test we could run, which introduces some form of correlation between parameters (i.e., a joint prior), but the space of possible correlation structures is so large that we have yet to find a way to explore it that is not computationally crazy.

Uniform sensitivity

By the law of large numbers, the PNOF and LTY estimates will always converge. But what will they converge to if the bounds are off by small amounts (at most ±10%)? In MERA this could happen because we set, say, the boundary between extremely depleted and very depleted at 3% rather than 2.7%.

What we do here is re-run the same MSE after shocking the bounds of each prior (all 37 of them, with a separate shock for each) by the random variable \(\epsilon \sim U[0.9, 1.1]\). Notice that we shock the upper and lower bound by the same factor, so we ignore the case where the boundaries shrink on one side and widen on the other.
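
A minimal sketch of one shocked replicate, reusing `ranged_slots` from the sketch above (the handling of slots where a multiplicative rescale is not meaningful is glossed over):

```r
# Apply an independent multiplicative shock eps ~ U[0.9, 1.1] to each prior,
# scaling the lower and the upper bound by the same factor
shock_bounds <- function(OM, slots, size = 0.1) {
  for (s in slots) {
    eps <- runif(1, 1 - size, 1 + size)
    slot(OM, s) <- slot(OM, s) * eps
  }
  OM
}

OM_shocked <- shock_bounds(OM, ranged_slots)
OM_shocked@nsim <- 500   # 500 simulations per shocked prior, as in the figures
MSE_shocked <- runMSE(OM_shocked, MPs = c("matlenlim", "Itarget1"))
```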

We do this 1494 times; a further 465 MSEs had to be discarded because even these minor boundary changes resulted in inconsistent operating models (the parameter combination failed to achieve the desired depletion).
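
We treat a failed model build as a missing observation. A sketch, reusing the helpers above and assuming (as in our runs) that an infeasible parameter combination surfaces as an error from `runMSE`:

```r
# Run one shocked replicate, returning NULL when the operating model
# cannot reach the requested depletion
run_one <- function(seed) {
  set.seed(seed)
  OMi <- shock_bounds(OM, ranged_slots)
  OMi@nsim <- 500
  tryCatch(runMSE(OMi, MPs = c("matlenlim", "Itarget1")),
           error = function(e) NULL)
}

# 1959 attempts, of which 1494 survive the feasibility check
results <- Filter(Negate(is.null), lapply(seq_len(1959), run_one))
```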

In summary, just by random search:

* it is often possible to find cases where an MP converges to numbers quite far from the original value;
* the ranking between policies is quite consistent: when the runs are paired, Itarget1 always has the better PNOF.

Figure 2: Histogram representing 1494 pairs of probability of not overfishing (PNOF) and long-term yield (LTY) achieved by the two chosen management procedures, where each pair is produced by running 500 MSE simulations, each with a different shock (between -10% and +10%) to the bounds of each prior distribution of unknown parameters. For both management procedures, it is possible to find examples where these small shocks to the priors push PNOF and LTY across the 80% thresholds.

Figure 3: Histogram representing the difference, within the same MSE simulation, between the probability of not overfishing (PNOF) and long-term yield (LTY) achieved by the two management procedures, expressed as the percentage-point distance between what is achieved by matlenlim and what is achieved by Itarget1. It is possible to find at least one example where the ranking is reversed for both PNOF and LTY, but for the majority of simulations the ranking is maintained. Each element in the histogram is generated by running 500 MSE simulations, each with a different shock (between -10% and +10%) to the bounds of each prior distribution of unknown parameters.

Triangle sensitivity

We probably have a better grasp of the bounds than of the shape of the priors. After all, the bounds are what we ask for in MERA; it would be very hard to ask for confidence within sub-intervals. What if the shapes assumed for the priors are off? For example, something that is assumed uniform may instead have more weight in certain parts of the distribution. In 712, say, we might put quite a wide bound on acceptable depletion while expecting the stock to be more likely heavily depleted.

We obviously can’t try every prior shape in the world, so we focus on triangular distributions:

* they are common in surveys (what’s the smallest value, the largest, the most common?);
* they strictly respect the original bounds;
* they introduce only one shape parameter (the mode of the distribution) for each free parameter in the DLMtool distribution (because we fix the boundaries).

For each simulation we sample a random vector \(x\) of length 37, where each element \(x_i \sim U[0,1]\) represents the mode of the triangular distribution of that parameter (0 means the mode sits at the original lower bound, 1 at the original upper bound). The upper and lower bounds are the original ones; we don’t touch them. For each vector \(x\), and the resulting collection of triangular priors, we run an MSE and collect the convergent PNOF and LTY. We do this 2129 times (no runs had to be discarded).
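
A sketch of one such replicate. The triangular draw is a standard inverse-CDF sampler, and the per-simulation draws are passed to DLMtool through the custom-parameters slot `OM@cpars`, which bypasses the default uniform sampling; we assume here that the cpars names line up one-to-one with the ranged slots, which holds for most but not necessarily all of them:

```r
# Inverse-CDF sampler for a triangular distribution on [a, b] with mode m
rtri <- function(n, a, b, m) {
  u <- runif(n)
  f <- (m - a) / (b - a)
  ifelse(u < f,
         a + sqrt(u * (b - a) * (m - a)),
         b - sqrt((1 - u) * (b - a) * (b - m)))
}

# One shape-sensitivity replicate: a random mode x_i ~ U[0, 1] per parameter,
# mapped onto the original bounds, which stay untouched
OM_tri <- OM
OM_tri@nsim <- 500
x <- runif(length(ranged_slots))
for (i in seq_along(ranged_slots)) {
  bounds <- slot(OM, ranged_slots[i])
  mode_i <- bounds[1] + x[i] * (bounds[2] - bounds[1])
  OM_tri@cpars[[ranged_slots[i]]] <- rtri(OM_tri@nsim, bounds[1], bounds[2], mode_i)
}
MSE_tri <- runMSE(OM_tri, MPs = c("matlenlim", "Itarget1"))
```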

This particular simulation is more sensitive to shocks in shape. It is possible, for example, to find triangles where LTY is below 60% or above 95% for both strategies. Rankings are still more robust, but it is now possible to find triangular priors where the matlenlim strategy has a 10 percentage point advantage over Itarget1, and others where it has a disadvantage of similar magnitude.

Figure 4: Histogram representing 2129 pairs of probability of not overfishing (PNOF) and long-term yield (LTY) achieved by the two chosen management procedures, where each pair is produced by running 500 MSE simulations, each with a different set of triangular priors (all within the original bounds) with varying mode. For both management procedures, it is possible to find examples where these shocks to the prior shapes push PNOF and LTY across the 80% thresholds (as well as the 70% and 90% ones).

Figure 5: Histogram representing the difference, within the same MSE set, between the probability of not overfishing (PNOF) and long-term yield (LTY), expressed as the percentage-point distance between what is achieved by matlenlim and what is achieved by Itarget1. It is possible to find priors where matlenlim achieves an LTY 10 percentage points above Itarget1, as well as priors where the opposite is the case. Similarly, matlenlim can achieve both a higher and a lower PNOF than Itarget1, depending on the changes in the mode. Each element in the histogram is generated by running 500 MSE simulations, each with a different set of triangular priors (all within the original bounds) with varying mode.

What should we learn from this?

At the end of the day, with priors-only simulations, priors will matter. The problem is not that sensitivity makes the output invalid; quite the contrary. The approach of turning a deep uncertainty into a probabilistic one, as MERA does, remains valid and powerful. The point is that rankings based on one draw from the prior space (because fundamentally that is what we converge to) will conceal some of the uncertainty, and if we have the computational power, we can run a sensitivity analysis to get a better picture.

Explaining the sensitivity by clustering

When you have no data, sensitivity analysis is not an afterthought: it is the main result. We use clustering here to explain what is causing the sensitivity. The distributions on the parameters generate a neighborhood of multiple “kinds” of fisheries, and specific MPs are better for specific kinds of fisheries. You can see DLMtool/MERA as translating priors on inputs (fishery parameters) into priors on the likelihood of each kind of fishery emerging. This is helpful. Unfortunately, the probability of which kind of fishery we are dealing with is very sensitive (as demonstrated above) to the shape of, and small changes in, the priors on the inputs.

We reprise the 5000 simulations that composed the original MSE numbers. We cluster with a simple dendrogram, using Euclidean distance over the LTY achieved by the two policies. The runs split easily into four kinds of fisheries: the easy ones, the hard ones, the ones better suited to matlenlim and the ones better suited to Itarget1.
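
A sketch of the clustering step, assuming `lty_mat` is a 5000 × 2 matrix holding the per-simulation LTY of the two MPs (extracting it from the MSE object is glossed over here):

```r
# Hierarchical clustering on Euclidean distance over the two LTY columns
hc  <- hclust(dist(lty_mat))   # dist() defaults to Euclidean distance
grp <- cutree(hc, k = 4)       # cut the dendrogram into four fishery kinds

table(grp)                     # cluster sizes

# Mean LTY per cluster and per MP, to label the four kinds
aggregate(as.data.frame(lty_mat), by = list(cluster = grp), FUN = mean)
```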

Figure 6: Four tile plots where each row represents a management procedure, each column a separate simulation, and the color of each cell represents the LTY (long-term yield) achieved by that management procedure in that simulation; red is low, white is medium and blue is high. The 5000 simulations that make up the MSE run have been split into four groups by hierarchical clustering. Group 1 represents runs that tend to be easy for both policies, group 2 those where Itarget1 performs best, group 3 those for which matlenlim performs better, and group 4 those that are hard for both to manage.

As the figure shows, there are four kinds of fisheries within the neighborhood described by the DLM priors: one easy group, one hard group and two groups better served by a specific policy. The numbers we present in MERA tables are then just a transformation of the guessed probability of facing each kind of fishery. When we vary the bounds or the triangles, as in the previous sections, we change the relative sizes of these clusters, quite dramatically. In other words, in the sensitivity analysis step we are changing the relative probability of facing one kind of fishery versus another.

A second numerical example

Not every sensitivity analysis results in reversals, which is why it is important to actually run them rather than just worry about the possibility. Same OM as before, same technique, but here we focus on TAC policies alone.

We choose among the adaptive TAC procedures Itarget1, Ltarget1, Islope1 and LstepCC1 and the fixed TAC procedures DCAC and AvC. We basically see Itarget1 dominating everybody else. When we run the shape-based sensitivity analysis, we still see Itarget1 Pareto-dominating the competition. In this case, then, we would be more convinced of the strength of Itarget1 against all the proposed alternatives.
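
The run itself is the same sketch as before with a different candidate set:

```r
# Adaptive (index- and length-based) plus fixed TAC procedures
tac_mps <- c("Itarget1", "Ltarget1", "Islope1", "LstepCC1", "DCAC", "AvC")
MSE_tac <- runMSE(OM, MPs = tac_mps)
Tplot(MSE_tac)   # DLMtool's built-in tradeoff plot
```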

Figure 7: The tradeoff plot between probability of not overfishing (y-axis) and long-term yield (x-axis) for the TAC-based policies. While the absolute values are sensitive to modifying the priors’ shape, some semblance of ranking is maintained: the DCAC and AvC policies perform worst and Itarget1 seems to be Pareto-dominant under all conditions. Each dot represents the average values obtained after running the MSE simulation 500 times, thereby achieving convergence. Each separate dot in the sensitivity facets is obtained by a separate random shock to the priors, as described above. In total we produced 519 MSEs.